Suffix tree-based approach to detecting duplications in sequence diagrams

Authors

  • Hui Liu
  • Zhendong Niu
  • Zhiyi Ma
  • Weizhong Shao
Abstract

Models are core artefacts in software development and maintenance. Consequently, the quality of models, especially their maintainability and extensibility, becomes a major concern for most non-trivial applications. For various reasons, software models usually contain duplications. These duplications should be detected and removed because they may reduce the maintainability, extensibility and reusability of models. As an initial attempt to address the issue, the authors propose in this study an approach to detecting duplications in sequence diagrams. With special preprocessing, the authors convert two-dimensional (2-D) sequence diagrams into a 1-D array. They then construct a suffix tree for the array. With the suffix tree, duplications are detected and reported. To ensure that every duplication detected with the suffix tree can be extracted as a separate reusable sequence diagram, the authors revise the traditional construction algorithm of suffix trees by proposing a special algorithm to detect the longest common prefixes of suffixes. The authors also explore approaches to removing duplications. The proposed approach has been implemented in DuplicationDetector. With this implementation, the authors evaluated the proposed approach on six industrial applications. Evaluation results suggest that the approach is effective in detecting duplications in sequence diagrams. The main contributions of the study are an approach to detecting duplications in sequence diagrams, a prototype implementation and an initial evaluation.
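The pipeline described in the abstract (flatten the diagram into a 1-D array, index its suffixes, report repeated runs) can be illustrated with a minimal sketch. This is not the authors' DuplicationDetector: it swaps the suffix tree for a naively sorted list of suffixes, it does not reproduce their revised longest-common-prefix algorithm, and the `(sender, receiver, operation)` message encoding and the names `encode` and `find_duplications` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of one sequence-diagram message (an assumption).
Message = Tuple[str, str, str]  # (sender, receiver, operation)


def encode(messages: List[Message]) -> List[int]:
    """Flatten the diagram: map each distinct message to an integer token."""
    ids: Dict[Message, int] = {}
    return [ids.setdefault(m, len(ids)) for m in messages]


def find_duplications(tokens: List[int], min_len: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start_a, start_b, length) for repeated, non-overlapping runs.

    Uses a naive sorted-suffix list in place of a suffix tree.
    """
    n = len(tokens)
    # Sort suffix start positions by the content of each suffix.
    order = sorted(range(n), key=lambda i: tokens[i:])
    found = []
    for a, b in zip(order, order[1:]):
        lo, hi = min(a, b), max(a, b)
        # Longest common prefix of two lexicographically adjacent suffixes.
        k = 0
        while hi + k < n and tokens[lo + k] == tokens[hi + k]:
            k += 1
        # Cap the length so the two occurrences do not overlap.
        k = min(k, hi - lo)
        if k >= min_len:
            found.append((lo, hi, k))
    return sorted(found)


diagram = [
    ("A", "B", "login"), ("B", "C", "check"), ("C", "B", "ok"),
    ("A", "B", "query"),
    ("A", "B", "login"), ("B", "C", "check"), ("C", "B", "ok"),
]
tokens = encode(diagram)
duplicates = find_duplications(tokens, min_len=3)
```

Capping the match length at the distance between the two start positions is one simple way to keep reported occurrences disjoint. The paper's own criterion for whether a duplication can be extracted as a separate diagram may be stricter than this.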

Similar resources

Clone Detection in UML Sequence Diagrams Using Token Based Approach

Model-based development appears to be progressing rapidly in large-scale software companies. UML (Unified Modeling Language) is rising as a utility in software development. In object-oriented development, UML provides the complete details for the lifecycle. UML is a standard modeling language, so it is used for analysis, design and implementation of software-based systems. Clone detec...

A Dynamic Approach to Weighted Suffix Tree Construction Algorithm

At present, the weighted suffix tree is considered one of the most important existing data structures for analyzing molecular weighted sequences. Although a static-partitioning-based parallel algorithm exists for constructing weighted suffix trees, it takes a significant amount of time for very long weighted DNA sequences. However, in our implementation of dynamic partition based...

Ultra-fast Multiple Genome Sequence Matching Using GPU

In this paper, a contrastive evaluation of massively parallel implementations of suffix tree and suffix array to accelerate genome sequence matching is proposed, based on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides the suffix array occupying only approximately 20%∼30% of the space of the suffix tree, the coalesced binary search and tile optimization make suffix array clearl...

A new algorithm for detecting low-complexity regions in protein sequences

MOTIVATION: Pair-wise alignment of protein sequences and local similarity searches produce many false positives because of compositionally biased regions, also called low-complexity regions (LCRs), of amino acid residues. Masking and filtering such regions significantly improves the reliability of homology searches and, consequently, functional predictions. Most of the available algorithms are b...

CLUSEQ: Efficient and Effective Sequence Clustering

Analyzing sequence data has become increasingly important recently in the area of biological sequences, text documents, web access logs, etc. In this paper, we investigate the problem of clustering sequences based on their structural features. As a widely recognized technique, clustering has proven to be very useful in detecting unknown object categories and revealing hidden correlations among ...

Journal:
  • IET Software

Volume 5, Issue 

Pages  -

Publication date: 2011